Abstract

Multi-fractal patterns occur widely in nature. In developing new
algorithms to determine multi-fractal spectra of experimental data I
am led to the conclusion that generalised dimensions $D_q$ of order
$q \le 0$, including the Hausdorff dimension, are effectively irrelevant.
The reason is that these dimensions are extraordinarily sensitive to
regions of low density in the multi-fractal data. Instead, one should
concentrate attention on generalised dimensions $D_q$ for $q \ge 1$, and of
these the information dimension $D_1$ seems the most robustly estimated
from a finite amount of data.
1 Introduction

The characterisation of spatial distributions in terms of fractal concepts [T2*l
Hi] is becoming increasingly important. In particular, many distributions
in nature are found to have the characteristics of a multi-fractal [HI 13 Ej:
among many examples are galaxy clustering, strange attractors JH],
fluid turbulence [Ej, percolation [Zj, the shapes of neurons [HJIBj, and plant
distributions [2] and shapes [TP].

In application, methods for estimating fractal dimensions are often
unreliable. One source of error lies in largely unknown biases introduced by the
finite size of data sets, addressed by Grassberger [4], and in the associated
finite range of length-scales inherent in gathered data. In situations where
thousands or tens of thousands of data points are known such biases may
be minor; however, in some interesting problems, for example in the spatial
clustering of underwater plants [2], only of the order of 100 data points are
known and confidence in the fractal characterisation may be misplaced. We
need to know more about factors that cause errors in dimension estimates.

Section 2 discusses the sensitivity of the multiplicative multi-fractal
process to regions of very low probability (measure). Since such regions only
rarely contribute a data point, an experimental sample cannot discern them,
yet such regions do affect the generalised dimensions. Hence I argue that
the determination from experimental data of generalised dimensions, $D_q$, for
non-positive $q$ is meaningless; for $0 < q < 1$ computations are very sensitive
to the sample; and thus the most robust fractal dimension is the information
dimension $D_1$. The argument is supported in Section 3 by a maximum
likelihood method ^H] of estimating the multi-fractal properties of a data set.
The method shows the enormous sensitivity of $D_q$ for negative $q$. In contrast,
the information dimension is reliably estimated.

2 Poor conditioning of generalised dimensions 
of negative order 

For example, consider the Hausdorff dimension, $D_0$, of multifractals
generated by two different ternary multiplicative processes.

Figure 1: schematic diagram of the first few stages in the multiplicative
multi-fractal process to illustrate the sensitivity of the Hausdorff dimension
$D_0$ with respect to low density regions, (b), as a perturbation of the same
process with zero density regions, (a).

• Consider first the process shown in Figure 1(a) where an interval is
divided into three thirds and the "mass" of the original interval is
assigned as follows: a fraction $f_1 > 0$ to the left third; a fraction
$f_2 = 1 - f_1 > 0$ to the right third; and none to the middle third.
Repeat this subdivision recursively. This generates a multiplicative
multifractal whose Hausdorff dimension of $D_0 = \log_3 2 = 0.6309$ is
precisely the same as the Cantor set because there is no "mass" in the
middle thirds.

• Conversely, and perversely, consider the process shown in Figure 1(b)
where for some small $\varepsilon$ the "mass" is assigned as follows: a fraction
$f_1 > 0$ is assigned to the left third; a fraction $f_2 > 0$ is assigned
to the rightmost third; and a small fraction $\varepsilon > 0$ is assigned to the
middle third (such that $f_1 + f_2 + \varepsilon = 1$). Repeat recursively. This
generates a multiplicative multi-fractal whose Hausdorff dimension is
$D_0 = 1$ because there is "mass" everywhere along the whole interval!
Although the vast bulk of the "mass" can be covered by $2^n$ intervals of
length $3^{-n}$, we definitely do need $3^n$ intervals in order to ensure coverage
of the thinly spread "mass" that fills most of the original interval.

The importance of this for the analysis of an experimental data set of $N$
sampled points is that one cannot tell the difference from the data between these
two multi-fractal generating processes for an $\varepsilon = o(1/N)$. Thus one cannot
estimate the Hausdorff dimension $D_0$ with any accuracy since either answer,
0.6309 or 1, could be correct.
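As a rough numerical check of this indistinguishability argument, the following sketch computes the chance that a sample from process (b) contains no point in any middle third, and so looks exactly like a sample from process (a). It assumes each sampled point is only resolved to a fixed number of ternary digits, `depth`; the function name and the values N = 100, depth = 10 are illustrative choices, not taken from the text.

```python
import math

def prob_sample_avoids_middle_thirds(eps, n_points, depth):
    """Probability that none of n_points samples from process (b) has a
    ternary digit in a middle third within the first `depth` levels.

    Each ternary digit of a sampled point independently falls in the
    middle third with probability eps, so there are n_points * depth
    independent chances to reveal the thin middle-third "mass".
    """
    return (1.0 - eps) ** (n_points * depth)

# Hausdorff dimensions of the two candidate generating processes.
D0_EMPTY_MIDDLE = math.log(2) / math.log(3)   # process (a), eps = 0
D0_THIN_MIDDLE = 1.0                          # process (b), any eps > 0

if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        p = prob_sample_avoids_middle_thirds(eps, n_points=100, depth=10)
        print(f"eps={eps:g}: P(sample looks like process (a)) = {p:.3f}")
    print(f"yet D_0 is {D0_EMPTY_MIDDLE:.4f} for (a) and {D0_THIN_MIDDLE} for (b)")
```

Under these assumptions, once the middle-third fraction is well below 1/N most samples carry no trace of it, even though the two Hausdorff dimensions differ.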
Similar reasoning applies to generalised dimensions with negative $q$.
Elementary arguments give that the generalised dimensions [S] of the
multi-fractal generated by the second process above are

$$
D_q = \begin{cases}
\dfrac{1}{1-q}\,\log_3\left[f_1^q + f_2^q + \varepsilon^q\right] & \text{if } q \neq 1, \\[1ex]
-\left[f_1\log_3 f_1 + f_2\log_3 f_2 + \varepsilon\log_3\varepsilon\right] & \text{if } q = 1.
\end{cases}
\tag{1}
$$

It is readily appreciated that for negative order $q$ and small $\varepsilon$, the term
$\varepsilon^q$ inside the logarithm dominates the evaluation of the generalised dimension $D_q$.
Hence, all generalised dimensions for negative $q$ are also extremely sensitive
to small $\varepsilon$. In a data set obtained from experiments, one cannot expect to
distinguish between zero $\varepsilon$ and small non-zero $\varepsilon = o(1/N)$, and yet the
generalised exponents and multi-fractal spectrum are markedly different. See
Figure 2 which plots the generalised dimensions for $f_1 \approx 1/4$, $f_2 \approx 3/4$ and
various small $\varepsilon$.
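The sensitivity can be seen by direct evaluation. The sketch below assumes the usual closed form for this ternary process, $D_q = \log_3(f_1^q + f_2^q + \varepsilon^q)/(1-q)$ with the $q = 1$ case taken as the limit, and the masses $f_1 = (1-\varepsilon)/4$, $f_2 = 3(1-\varepsilon)/4$ used for Figure 2. The function names are illustrative, and thirds of zero mass are simply left out of the sum.

```python
import math

def generalised_dimension(q, f1, f2, eps):
    """D_q of the ternary multiplicative process with masses f1, eps, f2.

    Thirds of zero mass are skipped, so eps = 0 gives the Cantor-like
    process (a) rather than a division by zero for negative q.
    """
    masses = [m for m in (f1, f2, eps) if m > 0.0]
    if abs(q - 1.0) < 1e-12:
        # Information dimension: the q -> 1 limit of the general formula.
        return -sum(m * math.log(m, 3) for m in masses)
    return math.log(sum(m ** q for m in masses), 3) / (1.0 - q)

def dimension_for(q, eps):
    """D_q with f1 = (1 - eps)/4 and f2 = 3(1 - eps)/4."""
    return generalised_dimension(q, (1.0 - eps) / 4.0, 3.0 * (1.0 - eps) / 4.0, eps)

if __name__ == "__main__":
    print("        eps=0   eps=0.01  eps=0.05")
    for q in (-5, -2, 0, 1, 2, 5):
        row = "  ".join(f"{dimension_for(q, eps):8.4f}" for eps in (0.0, 0.01, 0.05))
        print(f"q={q:3d}: {row}")
```

Running it tabulates $D_q$ for the three values of $\varepsilon$; the rows for negative $q$ move by order one between columns, whereas the rows for $q \ge 1$ move only slightly.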

We can be more precise about the sensitivity to low density regions by
computing the derivative of $D_q$ with respect to $\varepsilon$. For definiteness, suppose
$f_1 = \phi_1(1 - \varepsilon)$ and $f_2 = \phi_2(1 - \varepsilon)$, so that $\phi_1 + \phi_2 = 1$. Then



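$$
\frac{\partial D_q}{\partial\varepsilon}
= \frac{-q}{q-1}\,
\frac{\varepsilon^{q-1} - (\phi_1^q + \phi_2^q)(1-\varepsilon)^{q-1}}
{\log 3\,\left[\varepsilon^q + (\phi_1^q + \phi_2^q)(1-\varepsilon)^q\right]} .
\tag{2}
$$

For small, but non-zero, $\varepsilon \to 0$ this asymptotes to

$$
\frac{\partial D_q}{\partial\varepsilon} \sim \frac{1}{\log 3}
\begin{cases}
\dfrac{q}{q-1} & \text{if } 1 < q, \\[1ex]
\dfrac{q}{(1-q)(\phi_1^q + \phi_2^q)}\,\varepsilon^{q-1} & \text{if } 0 < q < 1, \\[1ex]
\dfrac{q}{(1-q)\,\varepsilon} & \text{if } q < 0.
\end{cases}
\tag{3}
$$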

This derivative is unbounded as $\varepsilon \to 0$ for $q < 1$, and so any computation
of $D_q$ is only robust if $q \ge 1$.
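The derivative can be checked numerically. The sketch below assumes the parametrisation $f_1 = \phi_1(1-\varepsilon)$, $f_2 = \phi_2(1-\varepsilon)$ with the illustrative split $\phi_1 = 1/4$, $\phi_2 = 3/4$, writes out the closed-form derivative of $D_q = \log_3[\varepsilon^q + (\phi_1^q+\phi_2^q)(1-\varepsilon)^q]/(1-q)$, and compares it with a central finite difference. All names are hypothetical.

```python
import math

PHI1, PHI2 = 0.25, 0.75   # illustrative split of the non-middle mass
LN3 = math.log(3.0)

def dimension(q, eps):
    """D_q for f1 = PHI1 (1 - eps), f2 = PHI2 (1 - eps); needs q != 1, eps > 0."""
    s = eps ** q + (PHI1 ** q + PHI2 ** q) * (1.0 - eps) ** q
    return math.log(s) / ((1.0 - q) * LN3)

def d_dimension_d_eps(q, eps):
    """Closed-form derivative of D_q with respect to eps."""
    c = PHI1 ** q + PHI2 ** q
    numerator = eps ** (q - 1.0) - c * (1.0 - eps) ** (q - 1.0)
    denominator = LN3 * (eps ** q + c * (1.0 - eps) ** q)
    return -q / (q - 1.0) * numerator / denominator

def central_difference(q, eps, h=1e-7):
    """Finite-difference estimate of the same derivative."""
    return (dimension(q, eps + h) - dimension(q, eps - h)) / (2.0 * h)

if __name__ == "__main__":
    for q in (2.0, 0.5, -1.0):
        for eps in (1e-2, 1e-3, 1e-4):
            exact = d_dimension_d_eps(q, eps)
            approx = central_difference(q, eps, h=eps * 1e-4)
            print(f"q={q:4.1f} eps={eps:g}: closed form {exact:12.4f}, "
                  f"finite difference {approx:12.4f}")
```

For $q = 2$ the printed derivative settles near $2/\log 3$, while for $q = 0.5$ and $q = -1$ its magnitude keeps growing as $\varepsilon$ shrinks.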

The reason for this aberrant behaviour is clear. With a finite number 
of data points, it is impossible to tell the difference between truly empty 
space and space which is visited so rarely that no data point happens to fall 
within it. That is, one cannot tell the difference between empty space and 
space that should be filled in with very low probability. These differences
dramatically affect the generalised dimensions $D_q$ for $q < 1$. Thus for any
experimental data set:



• estimating $D_q$ for $q \le 0$ is nonsense (including the Hausdorff
dimension);

• estimates of $D_q$ for small positive $q$ are sensitive; and

• I only recommend the reporting of dimensions $D_q$ for $q \ge 1$ as being
robust.

Figure 2: multi-fractal generalised dimensions $D_q$ for the ternary
multi-fractal process with $f_1 = (1 - \varepsilon)/4$, $f_2 = 3(1 - \varepsilon)/4$ and
$\varepsilon = 0$ (solid), 0.01 (dashed) and 0.05 (dotted). This figure shows that
$D_q$ for negative order $q$ is extraordinarily sensitive to small influences:
the curve of smaller $\varepsilon$ is the most changed.

Tony Roberts, February 8, 2008

Out of all the generalised dimensions for order $q \ge 1$, $D_1$ is most
representative of the fractal as a whole. For large order $q$, the computation
of $D_q$ is determined only by the very "densest" regions of the multi-fractal
and so is not representative of the whole fractal. In the above multiplicative
process,

$$
D_q \sim -\log_3 \max(f_1, f_2, \varepsilon) \quad \text{as } q \to \infty,
$$

showing that the large $q$ behaviour is dictated by the one parameter of the
process that determines the character of the very densest clusters in the
fractal. The very dense clusters occur rarely in the fractal; they have low
fractal dimension as seen in the low $f$ value typically associated with low
values of $\alpha$ in the multi-fractal spectrum. Because of this rareness, the
computation from experimental data of $D_q$ for large positive order $q$ is
unreliable. Then, conversely, the information dimension weights the data most
uniformly, and so "knows" most about the fractal, without being overly
sensitive to the possible occurrence of regions of very low probability. The
information dimension seems most informative.
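The large-$q$ limit quoted above follows in one step if the ternary process is assumed to have $D_q = \log_3(f_1^q + f_2^q + \varepsilon^q)/(1-q)$; the symbol $m$ for the largest mass is introduced here only for the derivation.

```latex
\begin{align*}
D_q &= \frac{1}{1-q}\log_3\left[f_1^q + f_2^q + \varepsilon^q\right]
     = \frac{q}{1-q}\log_3 m
       + \frac{1}{1-q}\log_3\left[\left(\tfrac{f_1}{m}\right)^q
       + \left(\tfrac{f_2}{m}\right)^q
       + \left(\tfrac{\varepsilon}{m}\right)^q\right],
     \qquad m = \max(f_1, f_2, \varepsilon), \\
D_q &\to -\log_3 m \quad \text{as } q \to \infty .
\end{align*}
```

For $q > 0$ the bracket lies between 1 and 3, so the second term vanishes as $q \to \infty$ while $q/(1-q) \to -1$. For example, with $f_1 = 1/4$, $f_2 = 3/4$ and $\varepsilon = 0$ the limit is $-\log_3(3/4) \approx 0.262$, set entirely by the densest third.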

3 Fractal dimensions unbiased by finite size 
of data sets 

Cronin & Roberts PS] proposed a novel method to eliminate biases, caused 
by finite sized data sets, in determining the multi-fractal properties of a given 
data set. Jelenik et al. 0IE] used this method to explore the shape of neuron 
cells. The method compares characteristics of the inter-point distances in the 
data set with those of artificially generated multi-fractals. By maximising 
the likelihood that the characteristics are the same we model the multi- 
fractal nature of the data by the parameters of the artificial multi-fractal. 
By searching among artificial multi-fractals with precisely the same number 
of sample points as in the data, we anticipate that biases due to the finite 
sample size will be statistically the same in the data and in the artificial multi- 
fractals; hence predictions based upon the fitted multi-fractal parameters 
should be unbiased by the finite sample size. 

The method also appears to give a reliable indication of the error in the
estimates, a very desirable feature as also noted by Judd & Mees [TT]. Most
importantly for this paper, I generate finite size data sets with specific
parameters for the following specific multiplicative multi-fractal process. Given
parameters $p \in [0, 0.5]$ and $\phi \in [0, 0.5]$ a binary multiplicative
multi-fractal is generated by the recursive procedure of dividing each interval
into two halves, then assigning a fraction $\phi$ of the points in the interval
to a random sub-interval of length $p$ in the left half, and the complementary
fraction $\phi' = 1 - \phi$ to a random sub-interval of length $p$ in the right
half. Such a process has generalised dimension



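$$
D_q = \frac{\log\left(\phi^q + {\phi'}^q\right)}{(q-1)\log p}
\tag{4}
$$

and a multi-fractal spectrum $f(\alpha)$ 5>, §4] given parametrically in terms of
$0 < \xi < 1$ and $\xi' = 1 - \xi$ as

$$
f = \frac{\xi\log\xi + \xi'\log\xi'}{\log p}, \qquad
\alpha = \frac{\xi\log\phi + \xi'\log\phi'}{\log p}.
\tag{5}
$$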
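The two expressions are straightforward to evaluate. The sketch below assumes that (4) and (5) take the usual closed forms for a two-scale multiplicative measure, $D_q = \log(\phi^q + \phi'^q)/((q-1)\log p)$, $f = (\xi\log\xi + \xi'\log\xi')/\log p$ and $\alpha = (\xi\log\phi + \xi'\log\phi')/\log p$. The helper `xi_for_order`, which pairs a point on the spectrum with an order $q$, is the standard Legendre pairing and is an addition here, not something stated in the text.

```python
import math

def binary_dimension(q, p, phi):
    """D_q of the binary multiplicative process (needs 0 < p < 1, 0 < phi < 1)."""
    phi_c = 1.0 - phi
    if abs(q - 1.0) < 1e-12:
        # q -> 1 limit of the general formula: the information dimension.
        return (phi * math.log(phi) + phi_c * math.log(phi_c)) / math.log(p)
    return math.log(phi ** q + phi_c ** q) / ((q - 1.0) * math.log(p))

def spectrum_point(xi, p, phi):
    """Return (alpha, f) on the f(alpha) curve for a parameter 0 < xi < 1."""
    xi_c, phi_c = 1.0 - xi, 1.0 - phi
    alpha = (xi * math.log(phi) + xi_c * math.log(phi_c)) / math.log(p)
    f = (xi * math.log(xi) + xi_c * math.log(xi_c)) / math.log(p)
    return alpha, f

def xi_for_order(q, phi):
    """Parameter xi matching order q (assumed standard Legendre pairing)."""
    phi_c = 1.0 - phi
    return phi ** q / (phi ** q + phi_c ** q)

if __name__ == "__main__":
    p, phi = 1.0 / 3.0, 0.25   # illustrative parameter values
    for q in (-2, 0, 1, 2):
        alpha, f = spectrum_point(xi_for_order(q, phi), p, phi)
        print(f"q={q:2d}: D_q={binary_dimension(q, p, phi):.4f}, "
              f"alpha={alpha:.4f}, f={f:.4f}")
```

A useful consistency check is that the maximum of $f$, reached at $\xi = 1/2$, equals $D_0$, and that $q\alpha - f = (q-1)D_q$ for each paired point.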
Here I choose $p = 1/3$ and $\phi = 1/4$ and sample the process with $N = 100$; such
a multi-fractal forms a finite data set whose parameters we need to estimate
from the sample.
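One way such a sample might be generated is sketched below. Several details are assumptions not fixed by the text: the sub-interval length is taken relative to the current interval, its position within each half is uniform, the points are split between the halves binomially with probability $\phi$, and the recursion stops after a fixed `depth`, where any remaining points are placed uniformly. The function name and signature are hypothetical.

```python
import random

def sample_binary_multifractal(n_points, p, phi, rng, left=0.0, length=1.0, depth=12):
    """Return n_points positions drawn from a binary multiplicative process.

    Assumptions: sub-intervals have length p * length and are placed
    uniformly at random inside each half; each point goes to the left
    half with probability phi; after `depth` levels points are uniform.
    Requires 0 <= p <= 0.5 so that a sub-interval fits inside a half.
    """
    if n_points == 0:
        return []
    if depth == 0:
        return [left + length * rng.random() for _ in range(n_points)]
    # Binomial split of the points between the two halves.
    n_left = sum(1 for _ in range(n_points) if rng.random() < phi)
    half, sub = length / 2.0, p * length
    # Random placement of one sub-interval inside each half.
    left_start = left + (half - sub) * rng.random()
    right_start = left + half + (half - sub) * rng.random()
    points = sample_binary_multifractal(
        n_left, p, phi, rng, left_start, sub, depth - 1)
    points += sample_binary_multifractal(
        n_points - n_left, p, phi, rng, right_start, sub, depth - 1)
    return points

if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so the sample is reproducible
    sample = sorted(sample_binary_multifractal(100, 1.0 / 3.0, 0.25, rng))
    print(len(sample), "points; smallest five:", sample[:5])
```

A deterministic rounding of $\phi N$ in place of the binomial split would be an equally plausible reading; the choice affects the sampling noise but not the limiting measure.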

As explained in ^H], we analyse such a sample by probing it with exactly
the same multiplicative multi-fractal process, and seek the best fit
parameters. Here the resulting estimate of the original parameters is then in error
only due to the finite size of the sample of the original multi-fractal process.
Because we fit the data with a process which we know includes the one that
generated the data (a luxury rare in practice), there is no other error. Thus
the spread in errors that we see is characteristic of only the errors induced by
a finite sized sample, nothing else. In particular, observe that the deductions
of the preceding section are indeed appropriate.

I repeat the sampling of the multi-fractal followed by a maximum
likelihood estimate of the parameters 16 times. Figure 3 plots the estimates of
the parameters. Observe that the whole sampling and estimation process
appears unbiased in that the mean of the predictions is reasonably close to
the correct values of the parameters.

Ultimately, experimenters want to examine multi-fractal properties of the
data. Here these will be determined from the parameters $(p, \phi)$ of the best fit
multi-fractal substituted into analytic expressions such as (4) and (5). For
each of the 16 realisations and their best-fit estimates, I plot the
corresponding predicted generalised dimensions $D_q$ in Figure 4. (The corresponding
graphs of the multi-fractal spectra $f(\alpha)$ are plotted in Figure 7 of [TH] along
with the true $f(\alpha)$ curve.) Observe that the predicted dimensions for
positive $q$ (low $\alpha$) are quite good for all realisations, especially near the
information dimension, $D_1$. However, predicted dimensions for negative $q$ (high $\alpha$)
are very poor; this is also the case for the Hausdorff dimension $D_0$ (the
maximum of the $f(\alpha)$ curve). The negative $q$ predictions are poor despite the
fitting process "knowing" that there are no very low probability regions in
this artificial process. In general applications one cannot know this and I
expect the negative $q$ (large $\alpha$) predictions to be significantly worse. These
numerical results convincingly support the arguments of the preceding
section that we should use the information dimension, not the Hausdorff.

Figure 3: predicted multi-fractal parameters $(p, \phi)$, indicated by o's, from
the maximum likelihood match to an ensemble of 16 different realisations,
each of $N = 100$ data points, of a binary multiplicative multi-fractal with
parameters $p = 1/3$ and $\phi = 1/4$, indicated by +. The mean location of the
predictions is indicated by a ×.

Figure 4: ensemble of multi-fractal generalised dimensions $D_q$, dotted, for
each of the predictions plotted in Figure 3 made from samples of $N = 100$
data points. For comparison the generalised dimensions for the actual fractal
are plotted as the solid line. Observe the good estimation near the information
dimension, but the large errors for negative order $q$.